Keyword Weight Propagation for Indexing Structured Web Content

Authors

  • Jong Wook Kim
  • K. Selcuk Candan
Abstract

When documents are atomically structured, it is possible to assign them keyword vectors to support indexing. Most web content, however, has a non-atomic structure, such as the navigational and semantic hierarchies found on the web. Although such structures are especially effective for browsing, they make it hard to index individual nodes properly, because in many cases a node's content has to be inferred from the contents of its neighbors, ancestors, and descendants in the structure. In this paper, we propose a novel keyword and keyword weight propagation technique to properly enrich the data nodes in structured content. In particular, our approach first relies on understanding the context provided by the relative content relationships between entries in the structure. We then leverage this information for relative-content-preserving keyword propagation. Experiments show a significant improvement (10–15%) in precision with the proposed keyword propagation algorithm.


Similar resources

Content-Aware DataGuides for Indexing Large Collections of XML Documents

XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this e...

ChemDig: new approaches to chemically significant indexing and searching of distributed web collections

We describe an extension of the ht://Dig robot-based internet indexing and search engine to include the retrieval of information included in a variety of molecular data formats as defined by chemical MIME types. This is achieved by invoking chemical meta-parsers, software agents designed to provide key meta-data information about the content of the external chemical files. This meta-data can in...

Effective Searching in Structured Data

Keyword search is the mechanism of choice for information discovery and retrieval due to the enormous success of Internet search engines. In fact, nearly half of Internet users perform at least one search daily. The keyword search paradigm regrettably does not extend to similar forms of content, particularly semistructured and relational data. Searching structured content is difficult because s...

Answering Structured Queries on Unstructured Data

There is a growing number of applications that require access to both structured and unstructured data. Such collections of data have been referred to as dataspaces, and Dataspace Support Platforms (DSSPs) were proposed to offer several services over dataspaces, including search and query, source discovery and categorization, indexing and some forms of recovery. One of the key services of a DSSP ...

A Template-Based Approach to Keyword Search over Semantic Data

Keyword search is receiving a lot of attention not only in Web contexts but also in the database area. It is an easy way to allow inexperienced users to query systems without needing to know any specific language or how the data is structured. As a matter of fact, the amount of data available, on the Web as well as in other systems, is constantly increasing. And, with the improvements and the si...

Publication date: 2006